MEDB 5501, Module 08

2024-10-09

Topics to be covered

  • What you will learn
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment

History of R

Break #1

  • What you have learned
    • History of R
  • What’s coming next
    • Installing R

Special note

  • This slide show was created using R.
    • Not complicated
    • But beyond scope of this class
  • Source
    • https://github.com/pmean/introduction-to-r/tree/master/part1/src
  • A second resource
    • http://blog.pmean.com/powerpoint-with-r-markdown/

Installing R (https://cran.r-project.org/)

Screenshot of webpage for installation of R

Installing RStudio (https://rstudio.com/)

Screenshot of main page for RStudio

Installing RStudio (https://rstudio.com/)

Screenshot of products of RStudio

Installing R and RStudio

  • R is required
  • RStudio is strongly recommended
  • Do not delay in getting this software installed
  • Find me if you have ANY problems

“A place for everything, everything in its place”

  • data
    • raw/intermediate data files
  • doc
    • documentation
  • images
    • graphs
  • results
    • program output
  • src
    • program code
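
One way to set up this folder layout from R, as a minimal sketch. The project name my-project is a hypothetical placeholder, and the folders are created under a temporary directory so that nothing on your computer is overwritten; point project_root at your own location when you do this for real.

```r
# Hypothetical project location inside R's temporary directory.
project_root <- file.path(tempdir(), "my-project")

# The five folders listed above.
folders <- c("data", "doc", "images", "results", "src")

# Create each folder; recursive = TRUE also creates the project root.
for (folder in folders) {
  dir.create(
    file.path(project_root, folder),
    recursive = TRUE,
    showWarnings = FALSE)
}

# List the folders that now exist under the project root.
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```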

Break #2

  • What you have learned
    • Installing R
  • What’s coming next
    • Objects in R

Introduction

This is a very brief introduction to the basic objects in R.

[1] "R version 4.3.0 (2023-04-21 ucrt)"
[1] "2024-12-28"

Functions

[1] 1.732051
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
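
The code behind this output is not shown on the slide. A minimal sketch that prints the same values, assuming the square root function was used:

```r
# Apply a function to a single number.
sqrt(3)

# Apply the same function to every element of a vector.
sqrt(1:5)
```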

Nested functions and pipes

[1] 1.249046
[1] 1.249046
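
The code behind this output is not shown on the slide, so the sketch below uses its own numbers and is not meant to reproduce the value above. It only illustrates the idea that a nested call and a pipe give the same answer. The variable names are hypothetical.

```r
# Nested form: read from the inside out (mean first, then sqrt).
nested_result <- sqrt(mean(c(1, 4, 9, 16)))

# Piped form: read from left to right.
# The native pipe |> requires R 4.1 or later.
piped_result <- c(1, 4, 9, 16) |>
  mean() |>
  sqrt()

nested_result
piped_result

# Both forms produce exactly the same value.
identical(nested_result, piped_result)
```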

Named arguments in functions

[1] 134.8952
[1] 134.8952
[1] 2.326348
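
The code behind this output is not shown on the slide. The three values are consistent with the 99th percentile of a normal distribution, so the sketch below assumes qnorm with a mean of 100 and a standard deviation of 15; the original slide may have used a different function.

```r
# Arguments matched by position: p, mean, sd.
qnorm(0.99, 100, 15)

# The same call with named arguments; the order no longer matters.
qnorm(sd = 15, mean = 100, p = 0.99)

# Omitted arguments fall back to their defaults (mean = 0, sd = 1).
qnorm(0.99)
```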

Scalars

[1] 3
[1] "R"
[1] "3"

Vectors

[1] 1 2 3
[1] "a" "b" "c"
[1] "a" "2"
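
The code behind this output is not shown on the slide. A minimal sketch, assuming the c function was used, that prints the same three vectors:

```r
# A numeric vector.
c(1, 2, 3)

# A character vector.
c("a", "b", "c")

# A vector holds only one type, so mixing types forces a conversion:
# the number 2 becomes the string "2".
c("a", 2)
```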

Naming vectors

  BA   MS  PhD 
1977 1978 1982 
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

Matrices using cbind and rbind functions

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
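
The code behind this output is not shown on the slide. A minimal sketch, assuming the vectors 1:3 and 4:6 were combined, that prints the same two matrices:

```r
# cbind places each vector in its own column: 3 rows, 2 columns.
cbind(1:3, 4:6)

# rbind places each vector in its own row: 2 rows, 3 columns.
rbind(1:3, 4:6)
```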

Matrices using the matrix function

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
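
The code behind this output is not shown on the slide. A minimal sketch, assuming the values 1 through 6 were filled in row by row:

```r
# byrow = TRUE fills across each row, matching the output above.
matrix(1:6, nrow = 2, byrow = TRUE)

# The default fills down each column instead, giving a different layout.
matrix(1:6, nrow = 2)
```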

Lists

[[1]]
[1] 3

[[2]]
[1] "a" "b" "c"

[[3]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
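
The code behind this output is not shown on the slide. A minimal sketch that prints the same list; the name example_list is a hypothetical choice.

```r
# A list can hold objects of different types and shapes.
example_list <- list(
  3,
  c("a", "b", "c"),
  matrix(1:6, nrow = 2, byrow = TRUE))
example_list

# Double brackets extract one element; here, the character vector.
example_list[[2]]
```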

Lists using names

$name
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

$degrees
  BA   MS  PhD 
1977 1978 1982 

$age
[1] 64
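
The code behind this output is not shown on the slide. A minimal sketch that prints the same list, built from two named vectors and a number; the name person is a hypothetical choice.

```r
# Names can be given to vector elements and to list elements alike.
person <- list(
  name = c(
    first_name = "Stephen",
    middle_initial = "D",
    last_name = "Simon"),
  degrees = c(BA = 1977, MS = 1978, PhD = 1982),
  age = 64)
person

# Names allow lookup by label instead of by position.
person$age
person$degrees["PhD"]
```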

Data frames

  vector_example_1 vector_example_2
1                1                a
2                2                b
3                3                c

Naming data frame columns

  c.1..2..3. c..a....b....c..
1          1                a
2          2                b
3          3                c
  small_numbers early_letters
1             1             a
2             2             b
3             3             c
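
The code behind this output is not shown on the slide. A minimal sketch, assuming the data.frame function was called first without and then with column names:

```r
# Without names, R builds column names from the expressions themselves,
# which produces hard-to-read names such as c.1..2..3.
data.frame(c(1, 2, 3), c("a", "b", "c"))

# Naming the arguments gives readable column names.
data.frame(
  small_numbers = c(1, 2, 3),
  early_letters = c("a", "b", "c"))
```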

Tibbles

# A tibble: 3 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     2 b    
3     3 c    
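
The code behind this output is not shown on the slide. A minimal sketch, assuming the tibble package (part of the tidyverse) is installed:

```r
library(tibble)

# A tibble prints its dimensions and the type of each column.
example_tibble <- tibble(
  x = c(1, 2, 3),
  y = c("a", "b", "c"))
example_tibble
```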

Break #3

  • What you have learned
    • Objects in R
  • What’s coming next
    • Anatomy of a small R program

Anatomy of a small R program, overview

YAML header

---
title: "Illustrating the structure of an R program"
editor: source
format: 
  html:
    embed-resources: true
execute: 
  error: true
---

First comment


This program was written by Steve Simon and created on 2019-01-28 with a major
revision on 2024-12-27. It is used to illustrate the structure of an R program. 
This program is in the public domain. You can use it any way that you please.

First code chunk

```{r}
#| label: setup
#| message: false
#| warning: false

R.version.string
Sys.Date()
library(tidyverse)
```

Second comment


Read data from the aids-cases text file. This file is described at

https://github.com/pmean/data/blob/main/files/aids-cases.yaml

Second code chunk

```{r}
#| label: read-text-file

aids_cases <- read_csv(
  file="../data/aids-cases.csv",
  col_types="nnn")
glimpse(aids_cases)
```

Third comment


This is a small dataset with only three variables. Now let's draw a line graph.

Third code chunk

```{r}
#| label: line-graph

aids_cases |>
  ggplot() +
    aes(yr, nsw) +
    geom_line()
```

Fourth comment


There is an increasing trend in AIDS cases in New South Wales over time.

Anatomy of a small program, review

Output, overview

Output, part 1

Output, part 2

Output, part 3

Break #4

  • What you have learned
    • Anatomy of a small R program
  • What’s coming next
    • Live demonstration

Live demonstration of running R

In this segment, you will see a live demonstration running the program simon-5505-01-template.qmd.

Break #5

  • What you have learned
    • Live demonstration
  • What’s coming next
    • Good programming practices

General requirements for any program

There are standards in six areas:

  • Documentation
  • Graphs
  • Tables
  • Readability
  • Interpretation
  • Conciseness

There may be times when one or two of these standards do not apply. Which standards apply and which don’t should be obvious from the nature of the programming assignment.

Documentation is required!

Documentation should include

  • the name of the author (you!),
  • the creation date,
  • the purpose of your program, and
  • any restrictions on use (your choice).
    • Public domain (no restrictions)
    • Specific restrictions on how others can use your program

Graphs cannot rely on default choices, 1

Always modify your graphs. Do not settle for the default options.

  • Include your name and the date in the title of any graph
    • “Steve Simon produced this graph on 2023-09-19.”
  • Avoid the display of unnecessary decimal places on the axes
  • Use comma separators for large numbers
  • Replace category codes with descriptive labels

Graphs cannot rely on default choices, 2

  • Replace short variable names with longer descriptors
    • Include units of measurement, if needed
  • Avoid the gratuitous use of color
    • Unless needed to distinguish between groups
    • Fill boxes and points with white/transparent colors
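
Several of these graph standards can be sketched in a few lines, assuming the ggplot2 and scales packages are installed (both come with the tidyverse). The data set toy_counts and its values are invented purely for illustration, and "Your Name" is a placeholder.

```r
library(ggplot2)

# Hypothetical data, invented only to illustrate the formatting choices.
toy_counts <- data.frame(
  year = 2001:2005,
  cases = c(12000, 15500, 21000, 26500, 34000))

nice_plot <- ggplot(toy_counts) +
  aes(year, cases) +
  geom_line() +
  # Comma separators for large numbers on the vertical axis.
  scale_y_continuous(labels = scales::comma) +
  # Longer descriptors instead of short variable names,
  # plus a title with the author and date.
  labs(
    x = "Calendar year",
    y = "Number of cases",
    title = paste0("Your Name produced this graph on ", Sys.Date(), "."))
nice_plot
```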

Tables also need modification

  • Round to two or three significant figures
  • Use comma separators if numbers are >= 1,000
  • Avoid scientific notation (e.g., 1.23E-04)
  • Avoid reporting small p-values as zero (e.g., p=0.000)
    • Change to p<0.001
  • Suppress the printing of unneeded tables
    • Sometimes difficult
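
A minimal sketch of these table standards using only base R. The helper format_p is a hypothetical function written for this illustration, not part of any package.

```r
# Round to three significant figures.
signif(123456.789, 3)

# Add comma separators to large numbers (the result is a character string).
format(1234567, big.mark = ",")

# Avoid scientific notation for small numbers.
format(0.000123, scientific = FALSE)

# Hypothetical helper: report tiny p-values as "p<0.001"
# instead of "p=0.000".
format_p <- function(p) {
  ifelse(
    p < 0.001,
    "p<0.001",
    paste0("p=", formatC(p, format = "f", digits = 3)))
}
format_p(c(0.00004, 0.0234))
```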

Sometimes default tables/graphs are acceptable

  • Early assignments may ask for defaults
  • Always round and specify units in your interpretations

Your code must be easy to read

  • Make liberal use of
    • blank lines
    • line breaks
    • indenting
    • vertical lists

Always include an interpretation

  • Use simple evaluative words
    • Young/Elderly
    • Less than half/more than half
    • Almost all/almost none
    • Substantial improvement/roughly comparable
  • Depends on context
    • No penalty for subjective judgments

Conciseness

  • Do not include analyses that were not asked for
  • Avoid displaying excessively large tables
    • This may be difficult for SAS and SPSS

Data dictionary

If you include a data set that you found on your own rather than one that your instructors provided, you must include a data dictionary. The elements of a data dictionary should include:

  • Source
  • Description
  • Copyright
  • Size
  • Variables

Data dictionary: source

  • Where did you find the data?
    • Website link
    • Formal reference (if available)

Include a complete URL, except if your data is behind a paywall. If your data is associated with a peer-reviewed publication, provide a formal reference to that publication.

Data dictionary: Description

Provide a few sentences explaining the context of your data. Explain how the data was collected and what it is being used for.

Data dictionary: Size

  • Number of rows (excluding a header row)
  • Number of columns

Data dictionary: Variables

  • Name
  • Label
  • Units of measure

Data dictionary: Variable scale

  • Scale
    • Nominal
    • Ordinal
    • Interval
    • Ratio

Data dictionary: Variable range

  • Range
    • Non-negative (>= 0)
    • Positive (> 0)
    • Upper bound, if any

Data dictionary: Variable type

  • Type
    • Integer
    • Float
    • Character

File details

This file was written by Steve Simon on 2024-12-26. It is in the public domain and you can use it any way you please.

Break #6

  • What you have learned
    • Good programming practices
  • What’s coming next
    • Your programming assignment

Program

  • Download the program template
    • Store it in your src folder
  • Modify the file name
    • Use your last name instead of “simon”
    • Change “template” to “aids-cases”
  • Modify the documentation headers
    • Add your name
    • Optional: change the copyright statement

Data

Question 1

Calculate the minimum and maximum number of AIDS cases in Victoria from 1982 to 1987. The default format for this table is acceptable. Provide a brief interpretation.

Question 2

Graph the trend in AIDS cases in Victoria from 1982 to 1987. Use a nice format and provide a brief interpretation.

Grading rubric

You will be evaluated using the general grading rubric for programming assignments.

Your submission

  • Save the output in HTML format
  • Convert it to PDF format
  • Make sure that the PDF file includes
    • Your last name
    • The number of this course
    • The number of this module
  • Upload the file

If it doesn’t work

Please review the suggestions if you encounter an error page.

File details

This programming assignment was written by Steve Simon on 2024-12-18 and is placed in the public domain.

Summary

  • What you have learned
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment